In the table below, num is the number of data units processed by a single instruction, lat is the instruction latency (the number of cycles after which the result is ready), and thru is the instruction throughput (the number of cycles after which another instruction can be issued).
| SSE | num | lat | thru | 3DNow! | num | lat | thru |
|---|---|---|---|---|---|---|---|
| addps, subps, maxps, minps, cmpps | 4 | 4 | 2 | pfadd, pfsub*, pfmin, pfmax, pfcmp* | 2 | 4 | 1 |
| addss, subss, maxss, minss, cmpss | 1 | 3 | 1 | - | | | |
| mulps | 4 | 5 | 2 | pfmul | 2 | 4 | 1 |
| mulss | 1 | 4 | 1 | - | | | |
| rcpps, rsqrtps | 4 | 2 | 2 | pfrcp, pfrsqrt | 2 | 4 | 1 |
| MMX-mult | | 3 | 1 | MMX-mult | | 4 | 1 |
| other MMX | | 1 | 1 | other MMX | | 2 | 1 |
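
To make the num and thru columns easier to compare, the following minimal sketch computes a peak rate of num / thru data units per cycle for a few rows of the table. The dictionary `TABLE` and the function `peak_elements_per_cycle` are hypothetical names introduced for illustration, and the calculation assumes a stream of independent instructions with no decode, memory, or dependency stalls, so it is an upper bound rather than a measured figure.

```python
# Figures copied from the table: (num, lat, thru) per instruction.
TABLE = {
    "addps": (4, 4, 2),
    "addss": (1, 3, 1),
    "mulps": (4, 5, 2),
    "mulss": (1, 4, 1),
    "pfadd": (2, 4, 1),
    "pfmul": (2, 4, 1),
}


def peak_elements_per_cycle(name):
    """Upper bound on data units per cycle: num units every thru cycles."""
    num, _lat, thru = TABLE[name]
    return num / thru


if __name__ == "__main__":
    for name in TABLE:
        rate = peak_elements_per_cycle(name)
        print(f"{name}: {rate:.1f} data units per cycle")
```

Under these assumptions the packed SSE instructions and their 3DNow! counterparts listed here reach the same peak rate, because the wider SSE operation can only be issued every second cycle.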
Note that, in practice, the result of a 3DNow! instruction can be fetched by another instruction after 3 cycles, even though it is ready only after 4 cycles.
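
The effect of latency, throughput, and the early 3-cycle forwarding can be sketched with a simplified pipeline model. This is an illustration under stated assumptions, not a cycle-accurate simulation: the function names and the `forward_lat` parameter are hypothetical, and the model ignores decoding, memory access, and competition for execution units.

```python
def independent_cycles(n, lat, thru):
    """Cycles until the last of n independent instructions has its result.

    One instruction is issued every thru cycles; the last one, issued at
    cycle (n - 1) * thru, needs lat more cycles to complete.
    """
    if n == 0:
        return 0
    return (n - 1) * thru + lat


def dependent_chain_cycles(n, lat, forward_lat=None):
    """Cycles for a chain of n instructions, each consuming the previous result.

    forward_lat is the number of cycles after which a dependent instruction
    can pick up the result; it defaults to the full latency.
    """
    if n == 0:
        return 0
    if forward_lat is None:
        forward_lat = lat
    return (n - 1) * forward_lat + lat


if __name__ == "__main__":
    n = 8
    # addps: lat 4, thru 2
    print("addps independent:", independent_cycles(n, 4, 2))
    print("addps dependent:  ", dependent_chain_cycles(n, 4))
    # pfadd: lat 4, thru 1, result usable by a dependent instruction after 3 cycles
    print("pfadd independent:", independent_cycles(n, 4, 1))
    print("pfadd dependent:  ", dependent_chain_cycles(n, 4, forward_lat=3))
```

In this model a chain of eight dependent pfadd instructions takes 25 cycles instead of the 32 that the nominal latency of 4 would suggest, which is why the effective latency matters when scheduling dependent 3DNow! code.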